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Abstract — We present a particle filter construction for a system 
tliat exliibits time scale separation. The separation of time 
scales allows two simplifications that we exploit: i) the use 
of the averaging principle for the dimensional reduction of 
the dynamics for each particle during the prediction step and 
ii) the factorization of the transition probability for the Rao- 
Blackwellization of the update step. The resulting particle filter 
is faster and has smaller variance than the particle filter based 
on the original system. The method is tested on a multiscale 
stochastic differential equation and on a multiscale pure jump 
diffusion motivated by chemical reactions. 

Index Terms — particle filter, multiscale, dimensional reduction, 
variance reduction, Rao-Blackwellization, stochastic differential 
equations, jump Markov processes 

I. Introduction 

The field of dimensional reduction has seen a flourishing 
in the last decade (see e.g. the reviews in ([Tl, Q), mainly 
due to i) the realization that many systems of physical interest 
are more complex than one can handle even with the largest 
available computers and ii) the fact that for many complex 
systems the quantities of interest are coarse-scale features. 
Once a reduced model is constructed, it can be used in 
conjunction with filtering algorithms, like particle filters |3|, to 
incorporate information from real-time measurements. If the 
system under consideration exhibits time scale separation, the 
construction of a reduced model and subsequently of a particle 
filter, is also simplified. In this work we present a particle filter 
which exploits this simplification to create a particle filter that 
is more efficient than the particle filter of the original (large 
dimensional) system. The particle filter we construct is proved 
to converge to the analytical filter of the original system. 

A strategy for reducing the number of unknowns in an 
application of a filter for nonlinear dynamical systems is 
proposed in [41. In that work the authors assumed that the 
dynamics are strongly locally contractive in some directions 
which are found adaptively. Here we make an alternative 
assumption. We instead assume that the system exhibits a time 
scale separation. By this we mean that certain components or 
modes of the system tend to move very slowly in comparison 
with the rest of the modes. In particular we are interested in 
approximating conditional expectations the form 

E[/(X,J|{Zi,...,Zfe}] 

where Z}^ are noisy observations of some multiscale Markov 
process Xt at discrete times si, . . . , sat. For a very general 
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definition of a multiscale Markov process see fSl, fSl and fT\. 
We will illustrate the method presented below for the par- 
ticular case that Xt is the solution to a stochastic differential 
equations with time scale separation (see equation ( |III.l| i). The 
basic idea, however, can be applied to filtering problems for a 
variety of multiscale Markov process for which fast multiscale 
integration methods are available (see 13, S, HH, lfT2l . 
ifTSll . 114 J, |7| and the references therein). In section [IV] our 
filtering method is appUed first to the reconstruction of the 
trajectory of a multiscale stochastic differential equation and 
second to the reconstruction of a trajectory of a pure jump 
Markov process motivated by chemical reactions that takes 
place on vastly different time scales. 

Loosely speaking, recursive estimation of conditional expec- 
tations of the form above can be accomplished in two steps. 
First, in the prediction step, the system is evolved according 
to its evolution law. Second, in the update step, the resulting 
samples of the system are weighted by the likelihood of the 
next observation given the sample. We describe the filtering 
problem in more detail in the next section. The separation of 
time scales facilitates the construction of an efficient particle 
filter in two ways: i) it allows for fast evolution of the system 
in the prediction step and ii) it allows the integration of the 
observation weights over the fast modes of the system during 
the update step. Step ii) amounts to a Rao-Blackwellization of 
a standard particle filter estimator. 

Multiscale phenomena have been observed in wide ranging 
areas of research. For example, empirical evidence from 
a study of exchange rate dynamics in ifTSl . suggests the 
use of stochastic volatility models with both fast and slow 
time scales. Other examples of systems with scale separation 
include chemical reaction systems where there can exist a 
difference of several orders of magnitude among the differ- 
ent reaction rates (see lIHI, |[13J, L16j|, Ull, US)). Similar 
problems exist in material science (see lfT9l ) and molecular 
dynamics simulations (see |^0l) where one is interested in 
large scale features of the system but this behavior depends 
critically on the small (and fast) scale motion. An even more 
challenging problem is that of weather prediction and how to 
assimilate (through filtering) the vast amount of measurements 
collected daily around the globe. The weather system exhibits 
an extremely large range of active scales not necessarily 
with clear time scale separation (see ||2TI . Il22l ). A projection 
formalism framework, for the construction of reduced models 
of large systems with or without time scale separation, has 
been presented by Chorin and co-workers (see |23|). 

The main difficulty presented by multiscale phenomena is 
that they are extremely costly to integrate. The vast majority 
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of computational time is spent evolving the fast components of 
the system while one may be primarily interested in slow scale 
characteristics. This implies that the prediction step in any 
filtering method which does not take directly into account the 
multiscale structure of the problem will become prohibitively 
costly as the time scale separation increases. As described 
in detail below, our method addresses this issue through its 
incorporation of the multiscale integration scheme. 

A second issue, common to importance sampling techniques 
for high dimensional problems, is controlling the variance 
of the resulting estimator To address this in the context of 
particle filtering, many authors have suggested the use of some 
form of Rao-Blackwellization (see IMl, lES, Hg), (HT), ||28l . 
||29l for example). The distribution of the underlying Markov 
process given current and past observations can always be 
factored into a posterior marginal distribution for some set 
of (in our case slow) variables and a posterior conditional 
distribution for the remaining (in our case fast) variables 
given the first set of variables. Rao-Blackwellization requires 
that expectations with respect to the resulting posterior con- 
ditional distribution can be carried out exactly. In the case 
that the posterior conditional distribution can be sampled, 
the Rao-Blackwellization procedure can be approximated by 
Monte Carlo averaging over these samples. As discussed in 
detail later, the separation of scales assumption made in this 
paper allows an approximate factorization of the posterior 
distribution in which samples from the posterior conditional 
distribution can be easily and efficiently generated. While 
our assumption on the system clearly restricts the class of 
possible applications, another advantage of our setup is that the 
posterior conditional distribution of the unresolved variables is 
allowed to be quite general (for example very non-Gaussian). 

A method closely related to the one presented here was 
recently proposed in ll30l . There is, however, an important 
difference. In |30l it is assumed that the observations and 
the objective function (/ in the conditional expectation above) 
depend directly only on the slow modes in the system. Here we 
allow for general observations and general objective functions. 
This fact is central to the utility of our algorithm. The method 
proposed here is designed to not only improve the efficiency 
of the prediction step of a particle filter, but also to reduce the 
number of particles required to achieve a given accuracy. In 
fact, in some problems with a moderate time scale separation, 
one may not observe any gain in efficiency in the prediction 
step, while the reduction in variance may be significant. It 
should also be noted that analytical results for continuous time 
multiscale filtering problems have been obtained by Kushner 
in Hm. 

The paper is organized as follows. Section |ll] recalls the 
well known particle filter construction for a general Markov 
process which is observed with noise at a discrete set of 
instants. Section III presents our particle filter construction 



in the particular case of a multiscale stochastic differential 
equation. Section III-A discusses how the presence of a 



separation of time scales can lead to a construction of a 
reduced model for the slow components of the system and 
how it allows the Rao-Blackwellization of the filtering step. 
It also includes a particle filter construction which is based 



on the assumption that the reduced model can be constructed 
analytically and the Rao-Blackwellization of the filtering step 



can be performed analytically. Section III-B contains the main 
algorithm, which approximates the particle filter presented 



in Section III-A when the necessary calculations cannot be 



performed analytically. Section IV-A contains numerical re- 
sults for a system of stochastic differential equations. Section 



IV-B contains numerical results for a pure jump type Markov 



process motivated by multiscale chemical reactions. Finally, 
Section [V] contains a discussion of the algorithm and of the 
results. 

II. The filtering problem and particle filters 

We start by formulating the filtering problem. Assume 
that Xt is a d-dimensional Markov process with transition 
probability Qas{x, dx') where QQ{dx) is a known distribution 
on WK We assume for notational convenience, that the process 
is autonomous. The process Xt is usually called the hidden 
signal. Assume, also, that we have noisy discrete observations 
at N regularly spaced times of length As — Sk — Sk-i for 
k = 1, . . . ,N where sq — and Sk the time of the z-th 
observation. The observations satisfy. 



Zk = G{Xs^,Xk), k=l, 



where the Xk are i.i.d random variables, independent of Xt, 
and for all x the variable Z = G{x, x) admits a density 
z g{x, z) which is known. Let 1 : k denote the sequence 
{ 1 , . . . , fc} for k = 1 , . . . , A^. The filtering problem consists 
of computing the conditional expectations 



nfe/ = E[/(X,J|Zi:fc] forfc = l, 



where / belongs to some reasonable family of functions. The 
quantities 11^/ constitute the filter (called the analytical filter). 
The conditional expectation 11^;/ can be written as 



nfe/ = 



/ QQ{dxo)QAs{xo,dxsi)g{xs^,zi) 
J QQ{dxo)QAs{xo,dxsJg{xs,_, zi) 

QAaiXs,,_^,dx,^)g{Xs^,Zk)f{x.,J 
QAs{Xs^_,,dXsJg{Xs^,Zk) 



(II. 1) 



where / stands for integration over the variables 
Xsg , Xg-^ , . . . , Xgf. ■ Let Hk be the kernels, 

Hkf {xs,_,,zk) = J QAs{xs,_,,dxsJ 

X g{xs^,Zk) f{xsj dxs^. (II.2) 
The filter ( |II.l[ i can be written recursively as 

Uof = f fix) Qoidx), nfc/= ""'^^"{ . (II.3) 



Hk-iHk 1 

Usually, the integrals in ( |II.3| l cannot be computed analytically. 
A particle filter ll32l . p| is an approximation to the analytical 



filter (II. 3 I. In its simplest form, a particle filter consists of the 



following steps: 

Algorithm II.l Standard particle filter 

1) Sample n i.i.d. vectors (particles) Xq, j — l,...,n 
from Q()(dx). Set k = I. 
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2) For j = l,...,7i evolve Xl^__^ according to 
QUXi,_„dx') to get X% 

3) Calculate 



1 " 

1 " 

i=i 

4) Define the probability measure ^'JJ on with density 

1 " 

5) If k < N: Set k ^ k + I. Sample n i.i.d. variables 
Xl^ , j = 1, . . . , n from 5'^, awt/ go fo Step 2. 

We stop at the end of iteration N, and our approximation of 
Ilfc / is, 

wr f 
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III. Filtering for systems with time scale 

SEPARATION 
A. Scale separation and Rao-Blackwellization 

Many problems in the natural sciences give rise to systems 
with time scale separation. These systems represent a chal- 
lenge for numerical simulations. For example in molecular 
dynamics simulations femtoseconds timesteps are required 
to integrate the fastest atomic motions, while the object of 
interest is of orders of microseconds or longer. In the past 
four decades systems with scale separation have been the focus 
of extensive research within the framework of the averaging 
principle. The separation of scales is utilized to derive an 
effective model for the slow components of the system. While 
the averaging principle and its resulting effective dynamics 
provide a substantial simplification of the original system 
it is often impossible or impractical to obtain the reduced 
equations in closed form. This has motivated the development 
of algorithms such as multiscale integration methods described 
in the next section. 

To make the presentation more concrete we will describe the 
situation for a li-dimensional systems of stochastic differential 
equations with multiple time scales (see |33| Chapter 7). As 
mentioned in the introduction, the ideas that we will discuss 
can be applied to filtering problems for any multiscale Markov 
process for which multiscale integration methods are available. 

Let {Xl,Y^) be a solution of the system 

dXl = a{Xl, r/) dt + h{XI) dUt, X^ = xo 



(III. la) 



dn^-a{x^,Yndt+ 

e 



1 



f3{XlYt')dVt, Y„'^yo- 



(III. lb) 

where Ut and Vt are independent Brownian motions. The 
hidden variable {X^,Y^'^) is a Markov process with transi- 



dx + dy = d). The Xf variables evolve on a 0(1) time scale 
(the macro time scale), and the Y^ variables evolve on an 
0(e) time scale (the micro time scale). As above, we have 
noisy discrete observations. 



Zk = G{Xl.Y:.Xk). = 



where the Xk are i.i.d variables, independent of x,y, and for 
all a;, y the variable Z = G{x, y, x) admits a density z 
g{x, y, z) which is known. 

The standard approach to this filtering problem 
(see ||34ll ) is to use an easily sampled approximation 
Q'^^* {{x,y) ,{dx' ,dy')) of the transition probability 
Q'2^^{{x,y),{dx',dy')) where the discrete time step 6t 
is of scale comparable to e. If one is interested in the 
evolution of the system over 0(1) time scales then one 
must evolve the system for O(^) steps, which can be very 
costly for systems with large scale separation. In addition to 
simulation issues inherent to multiscale phenomena, particle 
filters can suffer from all of the usual difficulties associated 
with importance sampling in large dimensional spaces. That 
is, in high dimensional systems, the variance of the particle 
filter estimator can be difficult to control. 

In this section we show how the averaging principle and 
Rao-Blackwellization can be used to reduce the computational 
effort for each particle and the number of required particles. 
The method we discuss assumes that the Rao-Blackwellization 
and the construction of the dimensionally reduced model 
can be performed analytically. Unfortunately, this is rarely 
the case. In the next section, we show how the multiscale 
integration framework can be used to implement our approach 
when we are not able to perform the Rao-Blackwellization and 
the construction of the reduced model analytically. 

Under appropriate assumptions (see [35J, ||^, ||6l, |f33l ), the 
averaging principle dictates that as e ^ 

XI ^ Xt fori e [0,T], 
where Xt satisfies 

dXt = a{Xt) dt + h{Xt) dWt. 
The averaged coefficient a is given by 



(111.2) 



a{x) 



a{,x,y)nx{dy) 



(III.3) 



where fi^ {dy) is the invariant measure induced by ( |III.lb| l with 
the x variables fixed. 

The key consequence of the time scale separation is that, 
loosely speaking, for e small enough and for As ^ e, the tran- 
sition density Q'^^g{{x,y),{dx' ,dy')) can be approximately 
factored as 



lAs 



{{x,y),{dx',dy')) 



{x,dx')fix'{dy) (III.4) 



tion probability Q'^^^{{x,y),{dx' ,dy')) on 



Mi 



(with 



where Qas is the transition density for the averaged system, 
( |III.2| l. The factorization ( |III.4[ ) above is the central tool in the 
construction of the multiscale particle filter below and is not a 
feature limited to multiscale stochastic differential equations. 
The algorithms below apply to any problem for which this 
approximation holds and for which some means of sampling 
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(or approximately sampling) QAs{x,dx') and fi^'idy) are 
available. This is the case for both of the examples in Section 

El 



The relation ( |in.4| i suggests that an approximate particle 
filter can be constructed by defining the kernel, 



X fJ'x.^idy) g{xs^,y,zk) f{xs^,y). (III.5) 
The corresponding particle filter would proceed as follows, 

Algorithm III.l Standard particle filter corresponding to 

dnoj i. 

1) Sample n i.i.d. vectors (Xq,Yq) j — l^...,nfrom 
Qo(dx, dy) distribution. Set k — I. 

2) For each use ( |III.3| l to evolve X^^__^ to X'^^'' , i.e., 
sample from QAs{X^^__^,dx'). For each sample X'^J, 
generate an independent sample Y''' from the measure 

fix' ^- 

3) Calculate, 

1 " 
1 " 

Win^-Y,g{X'J,Y'\zk). 
n ^ — ' 

4) Define the probability measure ^'^ on with density 

^?.(dx,dy) = — — 

71 

X J2 9{X'J,Y'\zk) Sx'^^.idx) 5y.,{dy). 

5) If k < N: Set /c — > A: + 1. Sample n i.i.d. variables 
Xl^ , j — 1, . . . ,n from vp^, and go to Step 2. 

We stop at the end of iteration N, and our approximation of 
n/c / is. 



There is no reason to hope that this algorithm should provide 
any variance reduction. Indeed, in the ideal case that expres- 
sion ( |III.4| i is an equality, the variance of the particle filter 
would be identical to the variance of the standard particle filter 

uni 

However, knowledge of /x^.^^ (dy) allows one to integrate out 
y in expression ( |ni.5| l and write this same kernel in another 
form. 



Hkf {x,,^,,Zk) ^ J QAs{xs,_,,dxs^) 

X J lJ'x,^{dy) g{x,s^,y,Zk) f{xs^,y). (111.6) 

This suggests a different filtering algorithm: 

Algorithm III.2 Rao-Blackwellized particle filter 
corresponding to ( |III.6| l. 



1) Sample n i.i.d. vectors {Xq,Yq) j ~ l,...,nfrom 
QQ{dx, dy) distribution. Set fc = 1. 

2) For each j, use ( |III.3| l to evolve X^^__^ to X[.^'' , i.e., 
sample from QAs{Xl^_^,dx'). 

3) Calculate the following quantities: 

a) 



{x'J) 



fiX'J,y) 
X g{X' \y,Zk)iix' ^{dy) 
9{X'\y,Zk)fix',:i{dy)- 



b) 



1 " 
1 " 

-J2[^g]iX'J). 



4) Define the probability measure ^'^ on M.'^ with density 

nidx,dy) = ^^ 

n 

X V g{X' \y,Zk) 5x' ,{dx) Hx' ^{dy). 

5) If k < N: Set k —> k + 1. Sample n i.i.d. variables 
Xl^ , j = 1 , . . . , n from and go to Step 2. 

We stop at the end of iteration N, and our approximation of 
II/c / is. 



Win 



(111.7) 



Since the y components of the resampled particles after Step 
4 are not used in Step 2, we can, in practice, resample from 
the marginal density 

In order to illustrate the variance reduction aspects of 



algorithm III. 2 we will consider the asymptotic variance of 



a pair of related but much simpler estimators. Define the 
conditional expectations Ilk recursively by. 



no/= / fix) Qoidx), nfe/ = 



Let 



and 



hT:;=^f{XlY^g{XlYl,Zk) 



kJ:U9{Xi,Yi,z^ 



where {Xj.^Yl) are i.i.d. samples from the measure 
n/c-iQAsA* Notice that the only difference between /^/ 
and described in Algorithm ( |III.5| l is that the samples 
{X-^^Y^) are independently drawn from tlk-iQAslJ- instead 
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of its particle approximation ^5J_iQasM- ^ corresponding 
relationship holds between IJ^f and ^5?/. 

The estimators IJ^f and IJ!/ satisfy Central Limit Theo- 
rems, i.e. 

(111.9) 



and 



{I^f - Hfc/) ^ (0, a^) . (III. 10) 

An application of the delta method (see [36 1) yields that 

^2 / 9{X,Y) 



(/(x,y)-nfe/)^ 



and 



V N] (^) 

can be rewritten as 



nfe-ii?a 



(^) 



Therefore, by Jensen's inequality, we have that 



/.9 - .gHfc/ 



It is important to note that the variance reduction offered by 
Algorithm |III.2| does not require any Gaussian or degenerate 
measure approximations. The main assumptions are that the 
system have a multiscale structure and that one can sample 
from QAs{x,dx') and evaluate averages with respect to the 
ergodic measure fj.^ (which we assume exists). In many cases, 
for example those in Section |IV] the first assumption can be 
easily verified. For most general systems, QAs{x,dx') is not 
available in closed form. In practice, we must replace Qas by 
some approximation. For example in the case of xt above 
we can use the transition probability kernel for the Euler- 
Maruyama scheme with step size At, Q^l- Of course to apply 
the Euler approximation to ( |III.2[ ) we must be able to exactly 
evaluate averages with respect to fij.. The removal of this 
assumption will be addressed in the next section. 

B. Implementation of the multiscale integration for the re- 
duced particle filter 

In the particle filter construction of the previous section we 
used the fact that we can average over the invariant measure 
induced by the fast variables. This is usually impossible 
since the invariant measure is unknown or because integration 
cannot be performed analytically. We will demonstrate that 
this problem can be overcome by using multiscale integration 
schemes (see H), Q, GOl, fHl, HSl, El, El)- In the 
following description we will be discussing specifically the 
multiscale integration schemes of the form analyzed in ifTOl 
and ifTTl . The system studied in section IV-B is a pure jump 



Markov process and therefore requires a different multiscale 
integration scheme (see fT2ll . lfT3l ). Let At be a fixed time 
step, and Xk^i be the numerical approximation to the coarse 
variable, X from the previous section, at time tk^i ~ Sk + 1 At 



(recall Sk is the fc-th observation time). Assume for simplicity 
that L = ^ is an integer Inspired by the limiting equation 
( |III.2| i, Xk i is evolved in time by an Euler-Maruyama step, 

Xk,i+i^ Xk^i+A{Xk,i)At + biXk,i)AWt,, (IILll) 

for I = 0, . . . ,L — 1, where AWt^. , are Brownian displace- 
ments over a time interval At. We refer to ( |III.l l| l as the 
macro-solver, or macro integrator, and we denote its transition 
probability by Q^*'''*'*^(a;, dx'). Notice that with this notation 

Xk.O = Xk-l,L- 

The function A{x) approximates d{x), introduced in the 
previous section, which is the average of a over an ergodic 
measure. The ergodic property implies that instead of ensem- 
ble averaging we can use time averaging over trajectories of 
the rapid variables with fixed x. Since, by assumption, this 
average cannot be performed analytically, it is approximated 
by an empirical average over short runs of the fast dynamics. 
These "short runs" are over time intervals that are sufficiently 
long for empirical averages to be close to their limiting en- 
semble averages, yet sufficiently short for the entire procedure 
to be efficient compared to the direct solution of the coupled 
system. 

Thus, given the coarse variable at the k, l-th time step, Xk.i, 
we take some initial value for the fast component Y^.i^, and 
solve ( |III.lb| i numerically with step size 6t and x — Xk,i 
fixed. We denote the discrete variables associated with the fast 
dynamics at the k, l-th coarse step by Yk.i^m, to = 0, 1, ... , M. 
The numerical solver used to generate the sequence Yk.i,m 
is called the micro-solver, or micro-integrator. The simplest 
choice is again the Euler-Maruyama scheme. 



Yi 



1 



k.l: 



Yk.i 

+ ^P{Xk,l,YkJ,m)AVk,l, 



(III. 12) 



where AVk^i.m are Brownian displacements over a time inter- 
val St. In this equation Xk^i is a parameter in Yk.i^m though 
this will not be explicitly written. Since we assume that 
the dynamics of the y variables is ergodic, we may choose 
^k.i.a = ^fc,;-i,m for all k and /. For convenience however, 
we will set Yk^i.o ~ 0. As for the X variables, our notation 
implies Yfe,o,m ^fe-i.^^m for all to. 

The existence, under appropriate assumptions, of an invari- 
ant measure, /i**, of the numerical scheme ( |III.12| l, follows 
from results in ||37l . ||38|. The measure /^** is an approximation 
to fix. This suggests estimating the function a by 



A{Xk,i) 



1 



M 

M ^ 

rn — l 



a{X]~^l,Yk^l^m)- 



(III. 13) 



Since we will frequently encounter this form of trajectory 
averaging in the sequel, we define the symbol 



M 



1 



M ^ 



where h is some function defined on R'^y . We will henceforth 
omit from our notation, the dependence of S^h on the 
variables {yi, . . . , ym)- Equations pi.ll| ), piI.12| i, and pi.l3| ) 
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define the multiscale integration sclieme. We note here that, 
in fact, since for our filtering application of the multiscale 
integration scheme we are interested in reproducing the dis- 
tribution of the process ( |III.l| i and not actual trajectories, we 
can replace the average in ( III.13| l by evaluation at the single 
point Yk^i.M- Since expression ( III.13| l is only marginally more 
expensive and corresponds more naturally to the method of 
averaging, we will not bother with this simplification. 

Suppose that the joint distribution of Yj. ; „ given that 
Xk,i = X and Yk,io = is Qi^^^\dyi, . . . ,dyM)- We can 
define the measure by. 



M 

M ^ 

m—l 



where the subscript x on the expectation emphasizes the 
dependence of the random variables Yk^.m on the parameter 

X. 

By using trajectory averages over the fast dynamics to 
approximate integrals over ii^ of the observation density as 
well as the coefficient a we can define a particle filter which 
approximates the Rao-Blackwellization step of the previous 
section. The following kernel is an approximation of expres- 
sion ( |III.6| ), 



ryAt,St,M J. / \ 



^ As V-'-Sfc-i : ""^Sfc ; 



X ^^xlA^y) f(^'^K^^y)9ixs^,y,zk). (III. 14) 

The next particle filter, defined in the following algorithm, 
corresponds to ( |III.14[ ) and differs from Algorithm III. 2 in that 



we evolve the particles according to Q^* ''*'*^ instead of Qas 
and, in the update Step, instead averaging over the measure 
fix we average over a trajectory of Y^j 



Algorithm III.3 Multiscale particle filter 

1) Sample n i.i.d. vectors j 
Qo(dx, dy). Set k — \. 

2) For each j = 1 , . . . , n evolve 



, n from 



XL. 



according to Equations 1 
generate samples X'^.'' — X'Li l 



IIU2\ and \IIU3\ to 



i.e. sample from 

Q^f^''^\X-l_i,dx'). We use Y'Liim denote the 
fast variables evolved according to Eq. \nL12\ with 
parameter X'l._^ To initialize the Y'"' variables at 
each time Step of Eq. {III. 11 \ we choose, Y'j. i q — 0. 
3) For each j 



1, 



a) Evolve Y = according to (111.12) with 



parameter X'ff to get Y''' 



b) Calculate 



k,0.; 



M 



where 1 < m < M. 



^9iX'i,Yi,„^,Zk) 



1 M 



c) Calculate 



1 " 

with density 



4) Define the probability measure ^PJJ on 



Mn WJ^ 1 



n M 

EE 

j — l m—l 



9{X'k\Y,^ 



13 

k ' ^ fe,0,; 



^,Zk)5x'3{dx)5Y3 {dy) 



5) If k < N: Set fc — > k+1. Sample n i.i.d. variables Xj,, 
j — I, . . . ,n from ^P^J, and go to Step 2. 
We stop at the end of iteration N, and our approximation of 
Ilfc / is, 

Wl}f 



qjAt,5t.,M,n r 
k J 



Since the y components of the resampled particles after Step 
4 are not used in Step 2, we can, in practice, resample from 
the marginal density 



At.St,AI.i 



(dx) 



(dx). 



(III. 15) 

The two procedures give equivalent estimates and differ only 
in that the work required for the resampling Step using 
( |III.15| ) will scale with n instead of nM (of course calculating 
the averaged weights requires 0{nM) work). In pra ctice to 



initialize Y variables at each time Step of Eq. ( III. 11 



^ k,l.O ~ fc./-l 



we set. 



This choice results in faster equilibration 
of the Y' process. 

While the details of the multiscale integration scheme do 
depend on the particular Markov process under study, algo- 
rithm |III.3| can be applied in the same form to a wide variety 
of multiscale Markov processes (for example the system in 
section |IV-B| |. 

IV. Numerical results 

A. A stochastic differential equation 

We present a simple numerical example which demonstrates 
the variance reduction obtained through our algorithm. Con- 
sider the system given by, 

dX't = (r/ - (X'tf) dt + dUt X^ - Af{0, 1) 

dy/ = - - (r/)2) dt+^ dVt y^ ^ i). 

e V ^ 

(IV. 1) 

The parameter e in this example is set to 10^'*. A trajectory of 
the system is shown in Figure [T] The ergodic measure of the 
fast dynamics for this system is known and has the bimodal 
density 

i^x{y) = 



dy 



7 



2 3 




7 e 9 10 



= 0. • • ^.K! • • 









01 23456789 10 







Fig. 1. Trajectory of system jlV. 1 



Fig. 3. Trajectories of and 



and 




Fig. 2. The density for x = 1. 



A plot of fix for X = 1 is shown in Figure |2] 
The observations are given by, 



Xk, 



where x independent Gaussian random variables with 
mean and standard deviation 0. 1 . In the experiment below we 
take as the realization of the observations, Zk, the trajectory 
shown in Figure [T] sampled at every one unit of time. We 
will compare the standard particle filtering algorithm with 
Algorith m | in.3| Both methods are run with 1000 particles. 
System ( |IV. 1 1 is discretized by the Euler-Maruyama method 
with time step 6t = 10^^. The multiscale integration scheme 
uses a time steps of size At = 10^^ to evolve the reduced 
system ( |III.l l| i and of size 6t in the microscopic system 
( |III.12[ ). With this choice of parameters Algorithm |III.3| runs 
in about half of the time of the standard particle filter. 

As in the definitions of the particle filters above, for any 
function / define 



W2f{k) = 

1 



n 10' 



ll04 



}^ ^ f{X',\Yl,^Jg{X',^,Yl,^„^,z,). 



In Figure |3] we compare the pairs of estimators. 



and 
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Notice that the poor quality of the reconstruction of is 
not due to an error The symmetry of the observation model 
and of the dynamics, implies that, in the limit e ^ 0, the 
true condition expectation of Xf, given any observations of 
Y^ alone, will be identically zero. Therefore, both estimators 
appear to be accurate. 

In order to compare the quality of the samples generated 
by the two methods we compute the effective sample sizes of 
the empirical measures produced by both methods, 

essm=. 'Z,. and e...(fc) '''' 



l + Cfik) 



l + Ci{k) 



where 



and 



C2{k) 



M^il(fc)^ 



w n 

^Y.[g{Xf,Y^'\z,)-W,Hk)y 



W2l{k) V nl04 



\l^l^[9iKiil),Yl,^^,z,)-W2Hk) 

\ j = l m=0 

The effective sample size is a common measure of the quality 
of a weighted empirical measure produced by an importance 
sampling scheme (see |39|). Very roughly, the effective sample 
size gives the number of independent samples from the target 
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Fig. 4. Trajectories of essi and ess2. 



measure that would produce an unweighted empirical measure 
of equal quality. When all of the weights are equal (i.e. the 
samples are drawn from the target measure itself) the effective 
sample size is the total number of samples (in our case 1000). 

The trajectories of essi and ess2 are plotted in Figure]?] As 
can be seen in the plot, the effective sample sizes generated by 



Algorithm III. 3 are nearly 10 times as large as those generated 



by the standard particle filter This indicates that there is some 
improvement in the quality of the empirical measure generated 
by Algorithm jnO] 

It is important to note while the cost of the two methods 
in this experiment are comparable, in the limit e ^ a 
discretization of the system ( |I V. 1 | i would require smaller and 
smaller time steps. Thus the computational advantage for the 
multiscale particle filter would become extreme in this limit. 
The next example features a larger time scale separation and 
a correspondingly larger gain in efficiency due to multiscale 
integration. 

B. A pure jump type Markov process 

We now demonstrate our algorithm with a numerical ex- 
ample motivated by chemical reactions. The stochastic dy- 
namical behavior of a well stirred mixture of N molecular 
species that chemically interact through M reaction channels 
is accurately described by the chemical master equation. The 
master equation is usually simulated using the Stochastic 
Simulation Algorithm of Gillespie |40|. In cellular systems 
where the small number of molecules of a few reactant species 
sometimes necessitates a stochastic description of the system's 
temporal behavior, chemical reactions often take place on 
vastly different time scales. An exact stochastic simulation 
of such a system will necessarily spend most of its time 
simulating the more numerous fast reaction events. 

For N molecular species the state of the system S = 
(S'i(t), . . . , SN{t)) is the number of molecules of each species 
present at time t. The molecular populations Si ,i ~ 1, . . . , N 
are random variables. For each reaction channel Rj,j = 
1,...,M we define the propensity function aj{S) and the 
state change vector vj. The propensity function is such that 
aj{S)dt is the probability given S{t) = S that one Rj reaction 
will occur in the next infinitesimal time interval [t,t + dt]. 



The state change vector Vj is the change in the number of Si 
molecules produced by one Rj reaction. 

The pathwise evolution law for the master equation is a 
jump type Markov process on the non negative A^-dimensional 
integer lattice given by 



\ dSN 



I wi.i 
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/ dPi(ai(S)) 
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(IV.2) 



where Pj is a Poisson process with state dependent intensity 
parameter aj(S). 

In our system we choose = 6 species and M = 5 reaction 
channels. Variables 5*1, . . . , 5*5 are the fast variables and 5*6 is 
the slow variable. For the evolution of the fast variables we 
choose a simple fast biomolecular (reversible) reaction 



^1 

^3 



^2 
5*6 



*fc2 Si 



S2 + Sq 



and a fast (reversible) dimerization 



5*4 
^5 



5*4 
5*6 



*fc4 Si ' 



Si + 5*6 



The propensity functions are given by 

/ ai(S) \ / kiSiS2 



02(8) 
03(8 

V «4(S) J 

where the reaction rates fci , . 
will use the shorthand 



k2SiSQ 
hSi{SA - 1) 
kiS^sS^ 



. , fcs are specified below. We 



Sf — {Si, . . . , 55) . 

For the slow variable Sq we choose an external source (spon- 
taneous creation) 

S3 + 5*4 ^fej; + S'4 + 5*6, 

which is coupled to the fast variables through the slow 
reaction's propensity function 

05(8) = k^SsSi. 

The state change vectors for the system just described are 
given by the matrix, 

/ -1, 1, 0, 0, \ 
-1, 1, 0, 0, 
1,-1, 0, 0, 
0, 0,-2, 2, 
0, 0, 1,-1, 

V 0, 0, 0, 0, 1 J 

Finally the reaction constants vector is given by 

{ki, . . . , fcs) = (1000, 1000, 1000, 1000, 5 x lO"'^) 
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Fig. 5. A trajectory of the jump process with initial condition 
(700,700, 1300,900, 1050,400). 



A trajectory of this system is plotted in Figure |5] One can 
clearly see the separation in time scale between the slow and 
fast variables. By examination of the propensities, a^, above 
one can see that the Sf variables evolve on a time scale of 
roughly 10^^ while 5*6 evolves on a time scale of roughly 

10-2. 

The initial populations, Si{0), . . . , Se{0) are drawn accord- 
ing to 

(5i(0),...,S'6(0)) = (700,700, 1300, 900, 1050,400) + ?? 

where 77 is a vector of independent uniformly distributed 
random variables in the interval [—10,10]. The observations 
are taken every 1 unit of time from time to time 10 and are 
modeled by 

Z{1) = S{1) + x{l) 

where the x(0 independent vectors of independent, mean 
0, standard deviation 5, Gaussian random variables. While this 
noise model is somewhat arbitrary, it can be considered to 
model measurement error. In our experiments we take z{l) to 
be equal to the particular trajectory shown in Figure [5] at time 
? for ; = 0, . . . , 10. 

Because the standard particle filter is too expensive to run 
(almost 1000 times the cost of the multiscale particle filter) it is 
not available for comparison. However, we can still investigate 
the variance reduction aspect of the method by comparing 
two filters that both use the multiscale integration scheme 
during the prediction step, but that handle the update step 
in different ways. These two procedures are described in the 
next paragraph. For the details of the implementation of the 
multiscale integration scheme applied here please see 1121 . 

nia. 

The two particle filters applied here correspond roughly 



to algorithms III.l and III.2 In the first, at iteration I, we 



apply the multiscale integration scheme during the prediction 
step (thereby generating an approximate sample, S'q{1) from 
Qas) then we choose one sample 5"^ approximately from the 
measure /i^/j j;-) and proceed as in the rest of algorithm |III. 1 



The latter sampling step is accomplished by sampling the enc 
point from a long (10^* time units) equilibrated trajectory of 



a. 




Fig. 6. a. Trajectory of estimate e^. b. Trajectory of estimate e^. In both 
figures the observations are overlaid with diamonds connected by dashed lines. 



the fast dynamics with the value S'g{l) set as a parameter The 
second method (corresponding to |III.2| i proceeds in exactly 
the same way except that in the second step we sample 10"* 
points at equal time intervals, from a long (again 10^"* 



time units) equilibrated trajectory of the fast dynamics with 
the value S'q{1) set as a parameter The points 5"^ are then 
used to approximate the two averages appearing in Step 3 of 



algorithm III.2 With the severe scale separation present in this 
problem, the variance of the first method just described (no 
likelihood weight averaging) is an accurate representation of 
the variance of the standard particle filter The increased cost 
of the averaging in the second method is negligible. 

Both particle filters are tested with 1000 particles. The 
resulting estimators. 
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where, for any function /, 
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of Si are plotted in Figure |6] along with the hidden trajectory 
of 5*6. Clearly both methods accurately reconstruct the path 
of 5*. Upon close inspection one can see that the estimate 
is slightly closer to the actual trajectory (which is the same 
as the observations in this example). However Figure [6] is by 
no means conclusive evidence that that provides a better 
estimate. 
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Fig. 7. Trajectories of essi and ess2- 



separation is increased. We are not aware of any competitive 
alternative with these features. 

With minor modifications our particle filter can be applied in 
a slightly more general setting. Occasionally one is interested 
in multiscale systems for which it is impossible to explicitly 
fix the slow variable during evolution. For example, one may 
not know the laws governing the evolution of the system but 
can generate sample trajectories (by laboratory experiment). 
In these cases our particle filter remains valid. Roughly, the 
fact that the slow variable does not deviate much on the time 
scale of the fast variables allows one to replace evolution of 
the fast dynamics by evolution of the full system (see Ii4rj ). 
Note also, that the variance reduction technique discussed here 
applies to any importance sampling problem for a multiscale 
Markov process. 



The gain in efficiency of the empirical measure generated 
using the averaged weights does become clear when we 
compare the effective sample sizes of the empirical measures 
produced by both methods, 

1000 . 1000 
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The trajectories of essi and ess2 are plotted in Figure |7] As 
can be seen in the plot, the effective sample sizes generated 
by the particle filter with averaging are as much as 100 times 
greater than those generated by the particle filter without the 
averaging step. This indicates that the improvement in the 
quality of the empirical measure generated by the particle filter 
with likelihood weight averaging is significant. It is also clear 
that, due to the large time scale separation in this system, 
any filtering method that does not explicitly address this in 
the prediction stage of the filter (by, for example, multiscale 
integration) will be extremely slow. 

V. Conclusions 

We have presented an algorithm that combines dimensional 
reduction and approximate Rao-Blackwellization to create 
an efficient reduced variance particle filter for systems that 
exhibit time scale separation. We tested the algorithm on two 
systems with large time scale separations and the results are 
encouraging. 

Our method does not require that any of the distributions 
involved are nearly Gaussian or degenerate. Furthermore, the 
cost of our method does not increase as the time scale 
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